Abstract. The standard Bayesian model comparison formalism cannot be applied to most 
cosmological models as they lack well-motivated parameter priors. However, if the data-set 
being used is separable then it is possible to use some of the data to obtain the necessary 
parameter distributions, the rest of the data being retained for model comparison. While such 
methods are not fully prescriptive, they provide a route to applying Bayesian model comparison 
in cosmological situations where it could not otherwise be used. 
1. Introduction 

Much of observational cosmology can be thought of as an attempt to use astronomical
data to discriminate between the different cosmological models under consideration.
Given both the inevitably imperfect data and the intrinsically stochastic nature of many
cosmological measurements (i.e., cosmic variance), it is generally impossible to come to
absolute conclusions about the various candidate models; the best that can be hoped for
is to evaluate the probabilities, conditional on the available data, that each of the
candidate models is the correct description of the Universe. The fact that there is, as
far as is known, just a single observable Universe (i.e., there is no ensemble from which
it has been drawn) means that such probabilities cannot be frequency-based, and must
instead represent a degree of implication. Self-consistency arguments then require
(Cox 1946) that these probabilities be manipulated and inverted using Bayes’s theorem.

Taken together, the above facts imply that Bayesian model comparison (Section 2)
should be used to assess how well different cosmological models explain the available
data, although the fact that most such models have unspecified parameters is a significant
difficulty for this approach (Section 3). This problem can be solved for separable data-sets
using a two-step method of model comparison (Section 4), illustrated
here with high-redshift supernova (SN) data (Section 5).


2. Bayesian model comparison 

Given that one of a set of $N$ models, $\{M_1, M_2, \ldots, M_N\}$, is assumed to be true, the
state of knowledge conditional on all the available (and relevant) information, $I$, is fully
summarised by the probabilities $\Pr(M_1|I)$, $\Pr(M_2|I)$, ..., $\Pr(M_N|I)$, where $\Pr(M_i|I)$ is
the probability that the $i$’th model is correct (and $i \in \{1, 2, \ldots, N\}$). In the light of
some new data, $d$, that has not already been included in the above probabilities, Bayes’s
theorem gives the updated probability that model $i$ is correct as

$$\Pr(M_i|d, I) = \frac{\Pr(M_i|I)\,\Pr(d|M_i, I)}{\sum_{i'=1}^{N} \Pr(M_{i'}|I)\,\Pr(d|M_{i'}, I)}, \qquad (2.1)$$

where $\Pr(d|M_i, I)$ is the marginal likelihood under model $M_i$.
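Once the marginal likelihoods are known, Equation 2.1 is simple to evaluate. The following is a minimal Python sketch (the function name `posterior_model_probs` is hypothetical, not part of any library) that works with log marginal likelihoods and normalises with the log-sum-exp trick, since the marginal likelihoods of realistic data-sets are often far too small to represent directly.

```python
import math

def posterior_model_probs(prior_probs, log_evidences):
    """Evaluate Equation 2.1: Pr(M_i|d, I) from Pr(M_i|I) and ln Pr(d|M_i, I).

    Works in log space (log-sum-exp) so that very small marginal
    likelihoods do not underflow.
    """
    if len(prior_probs) != len(log_evidences):
        raise ValueError("need one prior probability per model")
    # ln[Pr(M_i|I) Pr(d|M_i, I)] for each model; zero-prior models drop out
    log_terms = [math.log(p) + le if p > 0 else -math.inf
                 for p, le in zip(prior_probs, log_evidences)]
    log_max = max(log_terms)
    weights = [math.exp(t - log_max) for t in log_terms]
    norm = sum(weights)  # denominator of Equation 2.1, rescaled by exp(log_max)
    return [w / norm for w in weights]

# Two equally probable models whose marginal likelihoods differ by a factor of 3
print(posterior_model_probs([0.5, 0.5], [math.log(3.0), 0.0]))  # approximately [0.75, 0.25]
```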
If model $M_i$ has $N_i$ unspecified parameters $\{\theta_i\} = \{\theta_{i,1}, \theta_{i,2}, \ldots, \theta_{i,N_i}\}$ then the
marginal likelihood is obtained by marginalising over these parameters to give

$$\Pr(d|M_i, I) = \int \Pr(\{\theta_i\}|M_i, I)\,\Pr(d|\{\theta_i\}, M_i, I)\,d\theta_{i,1}\,d\theta_{i,2} \cdots d\theta_{i,N_i}, \qquad (2.2)$$

where $\Pr(\{\theta_i\}|M_i, I)$ is the prior distribution of the parameter values in this model. This
expression demonstrates that the full specification of a model requires not just an explicit
parameterisation, but a distribution for those parameters as well; two mathematically
identical descriptions with different parameter priors are, in fact, different models.
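To make Equation 2.2 concrete, the sketch below assumes a hypothetical one-parameter model: independent Gaussian measurements of an unknown mean $\mu$ with known noise $\sigma$, and a unit-normalised uniform prior on $\mu$. The integral is done numerically with the trapezium rule, and the data values are invented purely for illustration. Changing only the prior range changes the marginal likelihood, which is the sense in which otherwise identical descriptions with different priors are different models.

```python
import math

def gaussian_log_likelihood(mu, data, sigma):
    """ln Pr(d|mu, M, I) for independent Gaussian measurements of known sigma."""
    return sum(-0.5 * ((x - mu) / sigma) ** 2
               - math.log(sigma * math.sqrt(2.0 * math.pi)) for x in data)

def marginal_likelihood_uniform(data, sigma, lo, hi, n_grid=2001):
    """Equation 2.2 for one parameter with a unit-normalised uniform prior on [lo, hi].

    The integral over mu uses the trapezium rule on a regular grid.
    """
    step = (hi - lo) / (n_grid - 1)
    prior_density = 1.0 / (hi - lo)  # Pr(mu|M, I), normalised to unity
    total = 0.0
    for k in range(n_grid):
        mu = lo + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezium end-point weights
        total += weight * prior_density * math.exp(gaussian_log_likelihood(mu, data, sigma))
    return total * step

data = [0.3, -0.1, 0.5, 0.2]  # illustrative values only
narrow = marginal_likelihood_uniform(data, 1.0, -1.0, 1.0)
wide = marginal_likelihood_uniform(data, 1.0, -10.0, 10.0)
print(narrow / wide)  # roughly 9: the wider prior spreads its predictions more thinly
```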


3. Comparison of models without parameter priors 

Equations 2.1 and 2.2 together summarise a self-consistent method for assessing which
of a set of models is better supported by the available information, provided that the
parameter priors for all the models are explicitly defined and unit-normalised. In particular,
while it is often possible to obtain sensible parameter constraints based on an
improper prior, such as $\Pr(\{\theta_i\}|M_i, I)$ constant for all $\{\theta_i\}$, the resultant marginal
likelihood is meaningless (Dickey 1961), because the arbitrary constant multiplying such a prior
carries straight through Equation 2.2. Unfortunately, it is commonly the case in astronomy
and cosmology that there is no compelling form for the models’ parameter priors and,
further, that the natural uninformative prior distributions are improper and cannot be
normalised. The apparent implication is that Bayesian model comparison, at least in the
form described in Section 2, cannot be used in cosmology, an idea that has been explored
previously by, e.g., Efstathiou (2008) and Jenkins & Peacock (2011). The disturbing
corollary would be that there is no self-consistent method to choose between the available
cosmological models, even if they are completely quantitative and mathematically
well-defined.
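The difficulty can be seen in a hypothetical toy problem: Gaussian data with known $\sigma$, where model $M_0$ fixes the mean at zero and $M_1$ gives it a uniform prior of width $L$ centred on zero. Once $L$ is much wider than the likelihood, the Bayes factor $\Pr(d|M_1, I)/\Pr(d|M_0, I)$ falls as $1/L$, so the improper limit $L \to \infty$ favours $M_0$ whatever the data. The sketch below (function name hypothetical) uses the closed-form Gaussian integral via `math.erf` rather than quadrature.

```python
import math

def bayes_factor_free_vs_fixed(data, sigma, width):
    """B_10 = Pr(d|M_1, I) / Pr(d|M_0, I) for Gaussian data of known sigma.

    M_0 fixes the mean at mu = 0; M_1 gives mu a uniform prior of the given
    width centred on zero. Factors common to both marginal likelihoods cancel.
    """
    n = len(data)
    xbar = sum(data) / n
    s = sigma / math.sqrt(n)  # width of the likelihood as a function of mu

    def phi(t):  # standard normal cumulative distribution function
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

    lo, hi = -0.5 * width, 0.5 * width
    # integral of exp[-(mu - xbar)^2 / (2 s^2)] over the prior range
    integral = math.sqrt(2.0 * math.pi) * s * (phi((hi - xbar) / s) - phi((lo - xbar) / s))
    return (integral / width) / math.exp(-0.5 * (xbar / s) ** 2)

data = [1.1, 0.7, 1.4, 0.9]  # hypothetical measurements
for width in (10.0, 100.0, 1000.0):
    # each tenfold increase in width reduces the Bayes factor tenfold
    print(width, bayes_factor_free_vs_fixed(data, 1.0, width))
```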


4. Model comparison with separable data 

The idea that the relative degree of support for models with unspecified parameters is
undefined is at odds with the marked (and data-driven) progress that has been made
in cosmology over the last century. Clearly it is possible to use data to choose sensibly
between models even if they do not have well-motivated parameter priors; but can this be
formalised in a way that satisfies Bayes’s theorem and is hence logically self-consistent?

One possibility, for separable data-sets (such as those which consist of measurements
of many astronomical sources), is to use some of the available data to obtain the necessary
parameter priors and then to use the remaining data for model comparison. This is an
old concept, dating back at least to Lempers (1971) and explored subsequently by, e.g.,
Spiegelhalter & Smith (1982) and O’Hagan (1995). The central idea is to partition the
data as $d = \{d_1, d_2\}$, with the first partition of training data used to obtain the (partial)
posterior distribution for the parameters of the $i$’th model as


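$$\Pr(\{\theta_i\}|d_1, M_i, I) = \frac{\Pr(\{\theta_i\}|M_i, I)\,\Pr(d_1|\{\theta_i\}, M_i, I)}{\int \Pr(\{\theta'_i\}|M_i, I)\,\Pr(d_1|\{\theta'_i\}, M_i, I)\,d\theta'_{i,1}\,d\theta'_{i,2} \cdots d\theta'_{i,N_i}}, \qquad (4.1)$$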


where $\Pr(\{\theta_i\}|M_i, I)$, which need not be normalisable (any constant multiplying it cancels
between the numerator and denominator of Equation 4.1), should be a highly uninformative
prior. This posterior distribution can then be used as the prior needed to obtain a
meaningful marginal likelihood, which can then be evaluated for the testing data as


$$\Pr(d_2|d_1, M_i, I) = \int \Pr(\{\theta_i\}|d_1, M_i, I)\,\Pr(d_2|\{\theta_i\}, M_i, I)\,d\theta_{i,1}\,d\theta_{i,2} \cdots d\theta_{i,N_i}. \qquad (4.2)$$
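Equations 4.1 and 4.2 can be sketched numerically for a hypothetical one-parameter model: Gaussian measurements of an unknown mean $\mu$ with known $\sigma$ and a flat, improper prior on $\mu$. Because the prior's constant cancels in Equation 4.1, only the range of the integration grid (assumed here to be $[-20, 20]$, adequate for data of order unity) has to be chosen. A model with no free parameters, such as one with $\mu = 0$ fixed, needs no training, so its testing-data marginal likelihood is simply its likelihood for $d_2$. All names and data below are illustrative.

```python
import math

def log_like(mu, data, sigma):
    """ln Pr(data|mu, M, I) for independent Gaussian measurements of known sigma."""
    return sum(-0.5 * ((x - mu) / sigma) ** 2
               - math.log(sigma * math.sqrt(2.0 * math.pi)) for x in data)

def two_step_evidence(d1, d2, sigma, lo=-20.0, hi=20.0, n_grid=4001):
    """Pr(d_2|d_1, M_1, I) for a model with one free mean mu (Equations 4.1 and 4.2).

    The prior on mu is flat and improper; its constant cancels in Equation 4.1,
    so only the grid range [lo, hi] must be wide enough to hold the posterior.
    """
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + k * step for k in range(n_grid)]
    # Equation 4.1: partial posterior from the training data d_1 (log space for stability)
    log_post = [log_like(mu, d1, sigma) for mu in grid]
    peak = max(log_post)
    post = [math.exp(lp - peak) for lp in log_post]
    norm = sum(post) * step
    post = [p / norm for p in post]
    # Equation 4.2: average the testing-data likelihood over that posterior
    return sum(p * math.exp(log_like(mu, d2, sigma)) for p, mu in zip(post, grid)) * step

d1 = [0.8, 1.3]        # hypothetical training data
d2 = [1.0, 0.6, 1.2]   # hypothetical testing data
ev_free = two_step_evidence(d1, d2, 1.0)
ev_fixed = math.exp(log_like(0.0, d2, 1.0))  # model with mu fixed at 0: no training needed
print(ev_free / (ev_free + ev_fixed))  # posterior probability of the free-mean model, equal priors
```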
Figure 1. (left) The posterior distribution of $\Omega_{\rm m}$ and $\Omega_\Lambda$ implied by the Perlmutter et al.
(1999) SCP SN data and a uniform prior with $\Omega_{\rm m} \geq 0$. Highest posterior density contours
enclosing 68.3%, 95.4% and 99.7% of the posterior probability are shown. Also shown are the
prior distributions of the accelerating model and matter-only model for $\Omega_{\rm max} = 3$. (right) The
dependence of $\Pr(\mathrm{accel.}|d, I)$ on $\Omega_{\rm max}$, shown for different prior probabilities, $\Pr(\mathrm{accel.}|I)$.


This marginal likelihood is coherent, in the sense that it provides self-consistent updated
posterior probabilities when inserted into Equation 2.1, but there is no compelling
scheme for partitioning the data. It is tempting to average over the possible partitions,
but this approach does not have a rigorous motivation. Despite these ambiguities, this
two-step method of Bayesian model comparison for separable data does satisfy the Cox (1946)
self-consistency requirements and so provides a means of calculating posterior probabilities
for cosmological models with unspecified parameter priors.
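The partitioning ambiguity can be explored numerically. The sketch below assumes a hypothetical toy comparison: Gaussian data with known $\sigma$, where $M_0$ fixes the mean at zero and $M_1$ has a free mean with a flat, improper prior trained on $d_1$. Doing the Gaussian integrals in Equations 4.1 and 4.2 analytically gives a closed-form Bayes factor, so $\Pr(M_1|d_2, d_1, I)$ is cheap to evaluate for many random partitions. The spread of the results measures the ambiguity; their average is the tempting, but not rigorously motivated, summary. Function names and data are illustrative.

```python
import math
import random

def log_bayes_factor_two_step(d1, d2, sigma):
    """ln[Pr(d_2|d_1, M_1, I) / Pr(d_2|M_0, I)] for the toy Gaussian-mean comparison.

    With a flat prior, the training posterior on the mean is N(m1, sigma^2/n1);
    integrating the testing likelihood over it gives this closed form.
    """
    m1, m2 = sum(d1) / len(d1), sum(d2) / len(d2)
    v1, v2 = sigma ** 2 / len(d1), sigma ** 2 / len(d2)
    v = v1 + v2
    return 0.5 * math.log(v2 / v) - (m1 - m2) ** 2 / (2.0 * v) + m2 ** 2 / (2.0 * v2)

def partition_probabilities(data, sigma, n_train, n_partitions=1000, prior_m1=0.5, seed=1):
    """Pr(M_1|d_2, d_1, I) for many random training/testing partitions of the data."""
    rng = random.Random(seed)
    log_prior_odds = math.log(prior_m1 / (1.0 - prior_m1))
    probs = []
    for _ in range(n_partitions):
        shuffled = rng.sample(data, len(data))  # random permutation of the data
        d1, d2 = shuffled[:n_train], shuffled[n_train:]
        log_odds = log_prior_odds + log_bayes_factor_two_step(d1, d2, sigma)
        # logistic function written to avoid overflow in math.exp
        if log_odds >= 0.0:
            probs.append(1.0 / (1.0 + math.exp(-log_odds)))
        else:
            e = math.exp(log_odds)
            probs.append(e / (1.0 + e))
    return probs

data = [0.3, -0.2, 0.9, 0.1, -0.4, 0.6, 0.2, 0.0]  # hypothetical measurements
probs = partition_probabilities(data, 1.0, n_train=4)
print(min(probs), sum(probs) / len(probs), max(probs))
```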


5. Example: late-time acceleration and supernovae 

One of the most significant recent cosmological discoveries was that the Universe’s expansion
rate is increasing, a result which is often linked most strongly to the observations
of distant SNe made by Riess et al. (1998) and Perlmutter et al. (1999). The comparative
faintness of the SNe, given their redshifts and light-curve decay timescales, indicated that
the (normalised) cosmological constant, $\Omega_\Lambda$, is sufficiently large to override the deceleration
caused by the (normalised) matter density, $\Omega_{\rm m}$. Riess et al. (1998) and Perlmutter
et al. (1999) used their SN measurements, $d$, to obtain posterior distributions of the
form $\Pr(\Omega_{\rm m}, \Omega_\Lambda|d, I)$, under the assumption of uninformative (and improper) uniform
priors of the form $\Pr(\Omega_{\rm m}, \Omega_\Lambda|I) \propto \Theta(\Omega_{\rm m})$, where $\Theta(x)$ is the Heaviside step function. The
posterior distribution for the 42 SCP SNe from Perlmutter et al. (1999), reproduced in
Figure 1 (left), reveals that most of the models that are consistent with the data correspond
to an accelerating universe (i.e., $\Omega_\Lambda > \Omega_{\rm m}/2$).
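The SN constraints in Figure 1 come from comparing observed fluxes with the luminosity distances predicted by each $(\Omega_{\rm m}, \Omega_\Lambda)$ pair. As a reminder of that underlying relation, the sketch below computes the standard dimensionless luminosity distance $H_0 D_{\rm L}/c$ for a Friedmann model with matter and a cosmological constant (radiation neglected, curvature $\Omega_k = 1 - \Omega_{\rm m} - \Omega_\Lambda$), plus the acceleration criterion $\Omega_\Lambda > \Omega_{\rm m}/2$, i.e. a negative deceleration parameter $q_0 = \Omega_{\rm m}/2 - \Omega_\Lambda$. It is not the SCP analysis, which must also model the SNe's light curves and absolute magnitudes; the function names are hypothetical.

```python
import math

def luminosity_distance(z, omega_m, omega_l, n_steps=1000):
    """Dimensionless luminosity distance H_0 D_L / c for a matter + Lambda model.

    Curvature is Omega_k = 1 - Omega_m - Omega_Lambda; radiation is neglected.
    The comoving integral uses Simpson's rule (n_steps must be even).
    """
    omega_k = 1.0 - omega_m - omega_l

    def inv_e(zp):  # 1 / E(z) with E^2 = Om (1+z)^3 + Ok (1+z)^2 + OL
        e2 = omega_m * (1.0 + zp) ** 3 + omega_k * (1.0 + zp) ** 2 + omega_l
        if e2 <= 0.0:
            raise ValueError("E(z)^2 <= 0: model has no expansion history back to this redshift")
        return 1.0 / math.sqrt(e2)

    h = z / n_steps
    chi = inv_e(0.0) + inv_e(z)
    for k in range(1, n_steps):
        chi += (4.0 if k % 2 else 2.0) * inv_e(k * h)
    chi *= h / 3.0  # dimensionless comoving distance
    if abs(omega_k) < 1e-8:  # spatially flat
        transverse = chi
    elif omega_k > 0.0:  # open
        rk = math.sqrt(omega_k)
        transverse = math.sinh(rk * chi) / rk
    else:  # closed
        rk = math.sqrt(-omega_k)
        transverse = math.sin(rk * chi) / rk
    return (1.0 + z) * transverse

def is_accelerating(omega_m, omega_l):
    """True if q_0 = Omega_m / 2 - Omega_Lambda < 0."""
    return omega_l > 0.5 * omega_m

print(luminosity_distance(0.5, 1.0, 0.0), luminosity_distance(0.5, 0.3, 0.7))
```

The second distance exceeds the first, which is the sense in which SNe in an accelerating model appear comparatively faint at a given redshift.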

Figure 2. The distribution of $\Pr(\mathrm{accel.}|d_2, I)$ obtained from different partitions of the
Perlmutter et al. (1999) SN data set with training sets of 10 (left) and 21 (right) SNe. The open
symbols indicate the prior values (of, from left to right, 0.01, 0.05, 0.1 and 0.5) and the solid
symbols show the posterior values given by training and testing samples that alternate in redshift.

But do these data provide quantitative evidence of cosmological acceleration? Riess
et al. (1998) approached this question by calculating the fraction of the posterior with
$\Omega_\Lambda > \Omega_{\rm m}/2$, which is an apparently compelling 0.997 for the case shown in Figure 1.
The relevant Bayesian calculation (cf. Drell et al. 2000) should, however, be based on
the marginal likelihoods of an accelerating model (for which the prior is non-zero only
for $\Omega_\Lambda > \Omega_{\rm m}/2$) and a decelerating model (for which the obvious option is a matter-only
model with $\Omega_\Lambda = 0$). Such models can be fully specified (in the sense defined in Section 2)
by adding the restrictions that $0 \leq \Omega_{\rm m} \leq \Omega_{\rm max}$ and
$0 \leq \Omega_\Lambda \leq \min[\Omega_{\rm max}, \Omega_{\Lambda,{\rm BB}}(\Omega_{\rm m})]$
(the latter defined to reject models that did not begin with a Big Bang), where $\Omega_{\rm max} > 0$ is an
unspecified “hyper-parameter”. Figure 1 (right) shows the dependence of the posterior probability
of the accelerating model, $\Pr(\mathrm{accel.}|d, I)$, on $\Omega_{\rm max}$. Even the peak values of $\Pr(\mathrm{accel.}|d, I)$
are considerably lower than the posterior fraction quoted above, and the dependence on
the unknown value of $\Omega_{\rm max}$ is significant as well.
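The $\Omega_{\rm max}$ dependence can be reproduced in outline. The sketch below evaluates $\Pr(\mathrm{accel.}|d, I)$ on a midpoint-rule grid for the two uniform priors just defined, with two stated simplifications: the Big Bang restriction $\Omega_{\Lambda,{\rm BB}}(\Omega_{\rm m})$ is omitted, and the likelihood is a hypothetical Gaussian ridge (`toy_likelihood`, with invented numbers) standing in for the real SCP likelihood. Its output illustrates only the qualitative behaviour, not the values in Figure 1 (right).

```python
import math

def toy_likelihood(omega_m, omega_l):
    """Hypothetical stand-in for Pr(d|Omega_m, Omega_Lambda, I); NOT the SCP likelihood.

    A Gaussian ridge, tight across the line 0.8 Om - 0.6 OL = -0.2 and loose
    along it; the numbers are invented to loosely mimic an SN-like degeneracy.
    """
    u = 0.8 * omega_m - 0.6 * omega_l + 0.2  # across the ridge
    v = 0.6 * omega_m + 0.8 * omega_l - 1.0  # along the ridge
    return math.exp(-0.5 * (u / 0.1) ** 2 - 0.5 * (v / 0.5) ** 2)

def prob_accelerating(likelihood, omega_max, prior_accel=0.5, n=200):
    """Pr(accel.|d, I) for the accelerating and matter-only models, given omega_max.

    Accelerating: uniform on 0 <= Om <= omega_max, Om/2 < OL <= omega_max.
    Matter-only: OL = 0 and uniform 0 <= Om <= omega_max.
    The Big Bang restriction on OL is omitted in this sketch.
    """
    h = omega_max / n
    cells = [(i + 0.5) * h for i in range(n)]  # midpoint-rule grid points
    # Matter-only marginal likelihood: mean of the likelihood over its 1D prior
    ev_matter = sum(likelihood(om, 0.0) for om in cells) / n
    # Accelerating marginal likelihood: mean of the likelihood over its 2D prior region
    total, count = 0.0, 0
    for om in cells:
        for ol in cells:
            if ol > 0.5 * om:
                total += likelihood(om, ol)
                count += 1
    ev_accel = total / count
    num = prior_accel * ev_accel
    return num / (num + (1.0 - prior_accel) * ev_matter)

for omega_max in (1.0, 3.0, 10.0):
    print(omega_max, prob_accelerating(toy_likelihood, omega_max))
```

Because the accelerating model spreads its prior over an area growing roughly as $\Omega_{\rm max}^2$, while the matter-only prior spreads over a length $\Omega_{\rm max}$, the ratio of marginal likelihoods, and hence $\Pr(\mathrm{accel.}|d, I)$, shifts as $\Omega_{\rm max}$ changes.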

Rather than introducing an arbitrary new parameter, another option is to adopt the
two-step method described in Section 4, using some of the SN data to obtain a partial
posterior in $\Omega_{\rm m}$ and $\Omega_\Lambda$ for both the accelerating and matter-only models and then
using the remainder to perform model comparison. The results of doing so are shown
in Figure 2 for several different partitioning options (and assuming the two models are
equally probable a priori). These results again illustrate the standard Bayesian result
that the better-fitting accelerating model is not favoured so decisively over the more
predictive (i.e., “simpler”) matter-only model, a result that is robust to prior choice.
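The solid symbols in Figure 2 use training and testing samples that alternate in redshift. A minimal sketch of one way to build such splits, assuming only a list of SN redshifts, is below (the function name is hypothetical). With a training set of half the sample it gives strict alternation; for smaller training sets, such as 10 of 42 SNe, it spreads the training SNe evenly in redshift rank, which is an assumption rather than the scheme used for Figure 2. Each split would then be passed through Equations 4.1 and 4.2 for both models.

```python
def redshift_spread_partition(redshifts, n_train):
    """Split SN indices into a training set d_1 and a testing set d_2 by redshift rank.

    Training SNe sit at the evenly spaced ranks (k * n) // n_train for
    k = 0, ..., n_train - 1; the rest form the testing set. With n_train equal
    to half the sample this is strict alternation. Ties are broken by index.
    """
    n = len(redshifts)
    if not 0 < n_train < n:
        raise ValueError("need 0 < n_train < number of SNe")
    order = sorted(range(n), key=lambda j: (redshifts[j], j))  # indices ordered by redshift
    train_ranks = {(k * n) // n_train for k in range(n_train)}
    train = [order[r] for r in sorted(train_ranks)]
    test = [order[r] for r in range(n) if r not in train_ranks]
    return train, test

z = [0.83, 0.41, 0.65, 0.17, 0.50, 0.38]  # hypothetical redshifts
print(redshift_spread_partition(z, 3))  # ([3, 1, 2], [5, 4, 0])
```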

This two-step approach to model comparison could be applied to a variety of problems 
in astrophysics and cosmology (e.g., Bailer-Jones 2012, Khanin & Mortlock 2014).